Randomization’s Role in Research Study Design

A presentation example for teaching non-statistics students with no previous university-level statistics or math courses. It was designed specifically for the interview process for the Department of Statistical Sciences at the University of Toronto.

SharYIkEs!

Let’s get some Shark Attack data from data.world!

tidyverse

  • We’re using tidyverse again(!)
    • for its efficient, explicit, modern data processing capabilities
# https://www.tidyverse.org
# install.packages("tidyverse")
# install.packages("kableExtra")
# remotes::install_github('rstudio/rmarkdown')

library(tidyverse)
library(kableExtra)

Flossing

  • We always have to do careful data cleaning
  • Here’s what I had to do to pull out the months
    • and exclude data that didn’t have a month reported
    • (important momentarily!)
# https://data.world/shruti-prabhu/shark-attacks
# could also download this directly from the web, here:
# https://query.data.world/s/lsa57qir23vdwphqusrq2nk3d4hkxt
# probably best to download it though if we're gonna work with it for a while!

# 0. https://stackoverflow.com/questions/<below>
# 1. 14363085/invalid-multibyte-string-in-read-csv
# 2. 15564063/apostrophe-turning-into-x92
# 3. 2014069/windows-1252-to-utf-8-encoding
# 4. 56479923/how-do-you-set-encoding-fileencoding-option-in-readr-tidyverse
read_csv("attacks.csv", locale=locale(encoding="WINDOWS-1252")) %>% 
  select(-`Case Number_1`) %>% 
  select(-`Case Number_2`) %>% 
  filter(Country=='USA') %>%
  filter(nchar(Date)>5) %>%
  filter(Date!='1853 or 1854') %>%
  filter(Date!="1900-1905") %>%
  filter(Date!="1898-1899") %>%
  filter(str_detect(Date,'Ca.', negate=TRUE)) %>%
  filter(str_detect(Date,'Circa', negate=TRUE)) %>%
  filter(str_detect(Date,'Before', negate=TRUE)) %>%
  filter(str_detect(Date,'No date', negate=TRUE)) %>%
  filter(str_detect(Date,'Early', negate=TRUE)) %>%
  mutate(Date = str_replace(Date, '--', '-')) %>%
  mutate(Date = str_replace(Date, 'Jan 1858', 'Jan-1858')) %>%
  mutate(Date = str_replace(Date, 'Aug-24-1806', '24-Aug-1806')) %>%
  mutate(Date = str_replace(Date, 'May-17-1803', '17-May-1803')) %>%
  mutate(Date = str_replace(Date, 'July', 'Jul')) %>%
  mutate(Date = str_replace(Date, 'Sept', 'Sep')) %>%
  mutate(Date = str_replace(Date, 'Sep or ', '25-')) %>%
  mutate(Date = str_replace(Date, 'Mid ', '25-')) %>%
  mutate(Date = str_replace(Date, 'Late ', '05-')) %>%
  mutate(Date = str_replace(Date, 'Summer ', '01-Jul-')) %>%
  mutate(Date = str_replace(Date, 'Reported ', '')) %>%
  mutate(Date = str_replace(Date, '     ', '')) %>%
  mutate(Date = if_else(nchar(Date)==6, paste0('01-',Date), Date)) %>%
  mutate(Date = if_else(nchar(Date)==8, paste0('01-',Date), Date)) %>%
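  # https://tidyr.tidyverse.org/reference/separate.html
  # NOTE: Date is day-month-year, so the pieces labeled 'year' and 'day' actually
  # hold the day and the year, respectively; only 'month' is labeled correctly.
  separate(Date, c('year','month','day'), sep='-', remove=FALSE) %>%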
  drop_na(month) -> sharks
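
The last few steps of the pipeline above (pad partial dates, split on "-", drop rows without a month) can be tried on a small made-up table. This is only a sketch: the `toy` data and the `toy_months` name are hypothetical, and the columns are named `day`, `month`, `year` here (in date order) purely for illustration.

```r
library(tidyverse)

# Hypothetical toy data (not taken from attacks.csv) with mixed date formats
toy <- tibble(Date = c("10-Jun-2017", "Jun-2017", "Reported 04-Jul-2016", "1975"))

toy_months <- toy %>%
  # drop a leading "Reported " label
  mutate(Date = str_replace(Date, "Reported ", "")) %>%
  # month-year strings such as "Jun-2017" (8 characters) get a placeholder day
  mutate(Date = if_else(nchar(Date) == 8, paste0("01-", Date), Date)) %>%
  # split on "-"; rows with too few pieces are padded with NA on the right
  separate(Date, c("day", "month", "year"), sep = "-",
           remove = FALSE, fill = "right") %>%
  # exclude rows that have no month
  drop_na(month)

toy_months$month
# "Jun" "Jun" "Jul"
```

The year-only entry "1975" has no month piece after the split, so `drop_na(month)` removes it.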

Readability

  • Look how human-readable this tidy code is!
    • I use select, filter, and (especially) mutate all the time
    • but separate is a cool new feature I’ve never used before
    • read_csv and drop_na are old standards
    • Watch out for character encoding!
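
To see why the encoding matters, here is a minimal sketch that writes a tiny CSV containing a Windows-1252 curly apostrophe (byte 0x92) and reads it back with the same `locale(encoding=...)` argument used in the pipeline. The temporary file and its single `Location` value are made up for illustration; they are not part of the shark data.

```r
library(readr)

# Hypothetical one-row CSV written as raw Windows-1252 bytes;
# 0x92 is the Windows-1252 code for a right single quotation mark.
path <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("Location\nSullivan"), as.raw(0x92), charToRaw("s Island\n")),
  path
)

# Declaring the source encoding lets readr convert the text to UTF-8 on input.
ok <- read_csv(
  path,
  locale = locale(encoding = "WINDOWS-1252"),
  col_types = cols(Location = col_character())
)

ok$Location
```

Without the `locale` argument, readr assumes UTF-8, and the lone 0x92 byte is not valid UTF-8, which is the kind of problem the Stack Overflow links above describe.
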
Case Number Date year month day Year Type Country Area Location Activity Name Sex Age Injury Fatal (Y/N) Time Species Investigator or Source pdf href formula href original order
2017.06.10.a 10-Jun-17 10 Jun 17 2017 Unprovoked USA Florida Ponce Inlet, Volusia County Surfing Bryan Brock M 19 Laceration to left foot N 10h00 NA Daytona Beach News-Journal, 6/10/2017 2017.06.10.a-Brock.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.10.a-Brock.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.10.a-Brock.pdf 6093
2017.06.04 04-Jun-17 04 Jun 17 2017 Unprovoked USA Florida Middle Sambo Reef off Boca Chica, Monroe County Spearfishing Parker Simpson M NA Laceration to shin N NA 8’ shark Nine News, 6/7/2017 2017.06.04-Simpson.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.04-Simpson.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.06.04-Simpson.pdf 6091
2017.05.30 30-May-17 30 May 17 2017 Provoked USA South Carolina Awendaw, Charleston County Touching a shark Mackenzie Higgins F 20 Right hand bitten by hooked shark PROVOKED INCIDENT N NA 3’ shark C. Creswell, GSAF 2017.05.30-Higgins.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.30-Higgins.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.30-Higgins.pdf 6089
2017.05.28 28-May-17 28 May 17 2017 Unprovoked USA Florida Off Jupiter Feeding sharks Randy Jordan M NA Lacerations to right arm N Morning Tiger shark M. Michaelson, GSAF 2017.05.28-Jordan.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.28-Jordan.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.28-Jordan.pdf 6088
2017.05.03 03-May-17 03 May 17 2017 Invalid USA California Sunset Beach, Orange County Surfing Sophia Raab F 18 Laceration to thigh, likely caused by surfboard fin N 14h30 Shark involvement highly doubtful R. Collier, GSAF 2017.05.03-Raab.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.03-Raab.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.03-Raab.pdf 6083
2017.04.29.d 29-Apr-17 29 Apr 17 2017 Unprovoked USA California San Onofre, San Diego County NA Leeanne Ericson F NA Major injury to posterior thigh N 17h34 NA R. Collier, GSAF 2017.04.29.d-Ericson.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.29.d-Ericson.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.29.d-Ericson.pdf 6082
2017.04.29.b 29-Apr-17 29 Apr 17 2017 Unprovoked USA South Carolina Folly Beach, Charleston County Surfing Holly Dyar F 33 Left foot bitten N 11h00 NA C. Creswell, GSAF 2017.04.29.b-Dyar.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.29.b-Dyar.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.29.b-Dyar.pdf 6080
2017.04.26 26-Apr-17 26 Apr 17 2017 Invalid USA Florida NA Photo shoot Molly Cavelli F NA Alleged laceration to left ankle NA NA No shark invovlement - it ws a publicity stunt The Sun, 5/6/2017 2017.05.26-Cavelli.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.26-Cavelli.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.05.26-Cavelli.pdf 6077
2017.04.20 20-Apr-17 20 Apr 17 2017 Invalid USA South Carolina Georgetown County Swimming male M NA Laceration & puncture wounds to left foot N 08h50 Shark involvement not confirmed C. Creswell, GSAF 2017.04.20-PawleysIsland.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.20-PawleysIsland.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.20-PawleysIsland.pdf 6075
2017.04.17.b 17-Apr-17 17 Apr 17 2017 Unprovoked USA Florida Daytona Beach, Volusia County NA NA NA NA Minor bite to the foot N Afternoon NA Daytona Beach News-Journal, 4/17/2017 2017.04.17.b-Volusia.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.17.b-Volusia.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.17.b-Volusia.pdf 6074
2017.04.14 14-Apr-17 14 Apr 17 2017 Unprovoked USA Hawaii Kekaha Beach, Kauai Surfing Baboo M 28 Lower right leg severely injured N 09h00 Tiger shark, 12’ Hawaii News Now, 4/14/2017 2017.04.14-Baboo.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.14-Baboo.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.14-Baboo.pdf 6072
2017.04.13 13-Apr-17 13 Apr 17 2017 Unprovoked USA Florida Hanna Park, Jacksonville, Duval County Surfing Keeanan Perry M 17 Lacerations to right foot N 13h30 NA News4Jax, 4/14/2017 2017.04.13-Perry.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.13-Perry.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.13-Perry.pdf 6071
2017.04.12.b 12-Apr-17 12 Apr 17 2017 Unprovoked USA Florida St. Augustine Surfing Kerry Keyton F NA Lacerations to right foot N 13h45 NA The Surf Station, 4/13/2017 2017.04.12.b-Keyton.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.12.b-Keyton.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.12.b-Keyton.pdf 6070
2017.04.11 11-Apr-17 11 Apr 17 2017 Unprovoked USA Florida Ormond Beach, Volusia County Surfing Denise Holz-Oosterveld F 35 Calf bitten N 16h00 NA WFTV, 4/11/2017 2017.04.11-Oosterveld.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.11-Oosterveld.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.11-Oosterveld.pdf 6068
2017.04.10.b 10-Apr-17 10 Apr 17 2017 Unprovoked USA Florida Melbourne Beach, Brevard County Paddle boarding female F 10 Laceration to calf N 17h45 NA Brevard Times, 4/10/2017 2017.04.10.b-Melbourne.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.10.b-Melbourne.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.10.b-Melbourne.pdf 6067
2017.04.10.a 10-Apr-17 10 Apr 17 2017 Unprovoked USA Florida Melbourne Beach, Brevard County Swimming Heather Orr F 21 Minor injury to hand N 17h00 NA Brevard Times, 4/10/2017 2017.04.10.a-Orr.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.10.a-Orr.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.10.a-Orr.pdf 6066
2017.04.06 06-Apr-17 06 Apr 17 2017 Unprovoked USA Florida Daytona, Volusia County Swimming Kody Stephens M 16 Foot injured N NA NA Bradenton Herald, 4/7/2017 2017.04.07-Stephens.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.07-Stephens.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.07-Stephens.pdf 6065
2017.04.05 05-Apr-17 05 Apr 17 2017 Unprovoked USA Florida New Smyrna Beach, Volusia County Swimming Melanie Lawson F 51 Thigh nipped, minor injury N 13h00 NA AJC, 4/5/2017 2017.04.05-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.05-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.05-NSB.pdf 6064
2017.04.02.a 02-Apr-17 02 Apr 17 2017 Unprovoked USA Florida Destin, Okaloosa County Swimming Caitlyn Taylor F 17 Abrasions to lower left leg & puncture wounds to right leg N 15h00 5’ shark Okaloosa County Sheriff’s Office 2017.04.02.a-Taylor.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.02.a-Taylor.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.04.02.a-Taylor.pdf 6062
2017.03.27 27-Mar-17 27 Mar 17 2017 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfing Robert Nesbit M 58 Minor injury to left foot N 10h00 NA Orlando Sentinel, 3/27/2017 2017.03.27-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.03.27-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.03.27-NSB.pdf 6061
2017.03.18 18-Mar-17 18 Mar 17 2017 Unprovoked USA California Monterey Bay Kayaking Brian Correira M NA No injury, kayak bitten N 14h30 White shark R. Collier, GSAF 2017.03.18-Correira.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.03.18-Correira.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.03.18-Correira.pdf 6059
2017.02.11 11-Feb-17 11 Feb 17 2017 Unprovoked USA Florida Melbourne Beach, Brevard County Swimming male M 22 Injury to hand N 13h30 NA Florida Today, 2/11/2017 2017.02.11-MelbourneBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.02.11-MelbourneBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.02.11-MelbourneBeach.pdf 6053
2017.02.01.b 01-Feb-17 01 Feb 17 2017 Boat USA South Carolina 16 miles off Hilton Head Tagging sharks Chip Michelove & crew NA NA Shark bit boat, no injury to occupants N NA White shark, female, 14’ YouTube, 2/2/2017 2017.02.01.b-Boat.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.02.01.b-Boat.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.02.01.b-Boat.pdf 6050
2017.01.22 22-Jan-17 22 Jan 17 2017 Unprovoked USA Florida Vero Beach, Indian River County NA male M Teen Puncture wounds to lower arm or hand N 14h00 NA TCPalm, 1/22/2017 2017.01.22-VeroBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.01.22-VeroBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.01.22-VeroBeach.pdf 6048
2017.01.13.a 13-Jan-17 13 Jan 17 2017 Unprovoked USA Florida Jensen Beach NA a lifeguard M NA Minor injury to hand N Morning NA WPTV. 1/13/2017 2017.01.13.a-JensenBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.01.13.a-JensenBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.01.13.a-JensenBeach.pdf 6045
2017.01.05 05-Jan-17 05 Jan 17 2017 Unprovoked USA Florida Blockhouse Beach, Brevard County Wading male M 47 Minor injuries to foot N 12h30 NA Florida Today, 1/5/2017 2017.01.05-Brevard.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.01.05-Brevard.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2017.01.05-Brevard.pdf 6042
2016.12.27 27-Dec-16 27 Dec 16 2016 Unprovoked USA Florida Avalon State Park Beach, North Hutchinson Island, St Lucie County Surfing Zack Davis M 16 Lacerations to right forearm N 17h00 NA TCPalm, 12/28/2016 2016.12.27-Davis.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.12.27-Davis.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.12.27-Davis.pdf 6040
2016.12.11 11-Dec-16 11 Dec 16 2016 Invalid USA Florida New Smyrna Beach, Volusia County Surfing Shane Garthwait M 19 Cuts to right ankle & foot N Afternoon NA News Journal, 12/15/2016 2016.12.11-Garthwait.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.12.11-Garthwait.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.12.11-Garthwait.pdf 6036
2016.11.14 14-Nov-16 14 Nov 16 2016 Unprovoked USA Hawaii Kamaole Beach Park I, Maui Floating Barbara Zawacki F 58 Injuries to right calf and thigh N 10h30 Tiger shark Maui News, 11/14/2016 2016.11.14-Zawacki.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.11.14-Zawacki.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.11.14-Zawacki.pdf 6031
2016.10.30 30-Oct-16 30 Oct 16 2016 Unprovoked USA Florida Mayport Naval Station Beach, Duval County Surfing Daniel Adams M 41 Lacerations to right foot and ankle N Afternoon NA First Coast News, 10/31/2016 2016.10.30-Adams.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.30-Adams.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.30-Adams.pdf 6030
2016.10.29 29-Oct-16 29 Oct 16 2016 Unprovoked USA Florida Mayport Naval Station Duval County Surfing Derrick Shoup M 42 Lacerations to right hand N Afternoon NA First Coast News, 10/31/2016 2016.10.29-Shoup.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.29-Shoup.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.29-Shoup.pdf 6029
2016.10.21 21-Oct-16 21 Oct 16 2016 Unprovoked USA Hawaii Hooipa Beach Park, Maui Surfing Federico Jaime M 36 Left arm and leg injured N 17h00 6’ to 8’ shark Surfline, 10/23/2016 2016.10.21-Hookipa.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.21-Hookipa.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.21-Hookipa.pdf 6027
2016.10.14 14-Oct-16 14 Oct 16 2016 Unprovoked USA Hawaii Charlie Young Beach, Kihei, Maui Snorkeling female F 66 Injuries to left calf N 09h50 NA Maui News 2016.10.14-Maui.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.14-Maui.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.14-Maui.pdf 6025
2016.10.10 10-Oct-16 10 Oct 16 2016 Unprovoked USA Oregon Indian Beach, Ecola State Park, Clatsop County Surfing Joseph Tanner M 29 Wounds to upper thigh and lower leg N 16h00 NA UP Beacon, 10/12/2016 2016.10.10-Tanner.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.10-Tanner.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.10-Tanner.pdf 6022
2016.10.02 02-Oct-16 02 Oct 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfing male M 21 Cuts to dorsal surface of left foot N 11h30 NA Orlando Sentinel, 10/2/2016 2016.10.02-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.02-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.02-NSB.pdf 6021
2016.10.01 01-Oct-16 01 Oct 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfng male M 32 Minor injuries N 17h30 NA Orlando Sentinel, 10/2/2016 2016.10.01-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.01-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.10.01-NSB.pdf 6020
2016.09.18.c 18-Sep-16 18 Sep 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfing male M 16 Minor injury to thigh N 13h00 NA Orlando Sentinel, 9/19/2016 2016.09.18.c-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.18.c-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.18.c-NSB.pdf 6018
2016.09.18.b 18-Sep-16 18 Sep 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfing Chucky Luciano M 36 Lacerations to hands N 11h00 NA Orlando Sentinel, 9/19/2016 2016.09.18.b-Luciano.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.18.b-Luciano.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.18.b-Luciano.pdf 6017
2016.09.18.a 18-Sep-16 18 Sep 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfing male M 43 Lacerations to lower leg N 10h43 NA Orlando Sentinel, 9/19/2016 2016.09.18.a-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.18.a-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.18.a-NSB.pdf 6016
2016.09.17.b 17-Sep-16 17 Sep 16 2016 Unprovoked USA California Bunkers, Humboldt Bay, Eureka, Humboldt County Surfing Yuma M 43 No injury, board bitten N After noon NA R. Collier, GSAF 2016.09.17.b-Yuma.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.17.b-Yuma.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.17.b-Yuma.pdf 6015
2016.09.11 11-Sep-16 11 Sep 16 2016 Unprovoked USA Florida Ponte Vedra, St. Johns County Wading male M 60s Minor injury to arm N 15h15 3’ to 4’ shark News4Jax, 9/11/2016 2016.09.11-PonteVedra.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.11-PonteVedra.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.11-PonteVedra.pdf 6011
2016.09.07 07-Sep-16 07 Sep 16 2016 Unprovoked USA Hawaii Makaha, Oahu Swimming Lulu Bagnol F 51 Severe lacerations to shoulder & forearm N 14h30 Tiger shark, 10’ Hawaii News Now, 9/7/2016 2016.09.07-Bagnol.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.07-Bagnol.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.07-Bagnol.pdf 6010
2016.09.05.b 05-Sep-16 05 Sep 16 2016 Unprovoked USA South Carolina Kingston Plantation, Myrtle Beach, Horry County Boogie boarding Rylie Williams F 12 Lacerations & punctures to lower right leg N Late afternoon NA C. Creswell, GSAF 2016.09.05.b-Williams.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.05.b-Williams.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.05.b-Williams.pdf 6008
2016.09.04 04-Sep-16 04 Sep 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Body boarding Austin Moore M 9 Foot bitten N NA NA Orlando Sentinel, 9/7/2016 2016.09.04-Moore.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.04-Moore.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.04-Moore.pdf 6006
2016.09.01 01-Sep-16 01 Sep 16 2016 Unprovoked USA California Refugio State Beach, Santa Barbara County Spearfishing Tyler McQuillen M 22 Two toes broken & lacerated N NA White shark, 8’ to 10’ R. Collier, GSAF 2016.09.01-McQuillen.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.01-McQuillen.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.09.01-McQuillen.pdf 6005
2016.08.29.b 29-Aug-16 29 Aug 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfing Sam Cumiskey M 25 Lacerations to right foot N 15h00 Bull shark, 6’ News Channel 8, 8/30/16 2016.08.29.b-Cumiskey.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.08.29.b-Cumiskey.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.08.29.b-Cumiskey.pdf 6003
2016.08.29.a 29-Aug-16 29 Aug 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfing male M 37 Minor injury to ankle N 14h00 NA News Channel 8, 8/30/16 2016.08.29.a-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.08.29.a-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.08.29.a-NSB.pdf 6002
2016.08.25 25-Aug-16 25 Aug 16 2016 Unprovoked USA Florida Ponte Vedra, St. Johns County Wading David Cassetty M 49 Minor injury to ankle N 16h00 NA First Coast News, 7/25/2016 2016.08.25-Cassetty.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.08.25-Cassetty.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.08.25-Cassetty.pdf 6000
2016.08.06 06-Aug-16 06 Aug 16 2016 Unprovoked USA Hawaii Maui SUP Foil boarding Connor Baxter M 21 No inury, shark & board collided N 16h30 Tiger shark, 10’ SUP, 8/9/2015 2016.08.06-Baxter.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.08.06-Baxter.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.08.06-Baxter.pdf 5998
2016.08.04 04-Aug-16 04 Aug 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfing Nolan Tyler M 22 Big toe bitten N NA Blacktip shark News 965, 8/5/2016 2016.06.04-Tyler.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.04-Tyler.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.04-Tyler.pdf 5997
2016.07.27 27-Jul-16 27 Jul 16 2016 Provoked USA Florida Florida Keys, Monroe County Lobstering Warren Sapp M 43 Laceration to left forearm PROVOKED INCIDENT N NA Nurse shark, 4’ Tampa Bay Times, 7/27/2016 2016.07.27-Sapp.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.27-Sapp.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.27-Sapp.pdf 5993
2016.07.17 17-Jul-16 17 Jul 16 2016 Boat USA Alabama 8 miles off Mobile Fishing in Alabama Deep Fishing Rodeo Occupant: Ben Raines NA NA No injury, shark bit trolling motor N NA Tiger shark, 10’ Al.com, 7/19/2016 2016.07.17-Gulf.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.17-Gulf.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.17-Gulf.pdf 5987
2016.07.16.b 16-Jul-16 16 Jul 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfing female F 9 Minor injury to leg N 1300 NA Orlando Sentinel, 7/21/2016 2016.07.16.b-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.16.b-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.16.b-NSB.pdf 5986
2016.07.16.a 16-Jul-16 16 Jul 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County NA female F 11 Minor injury to toes N 11h00 NA Orlando Sentinel, 7/21/2016 2016.07.16.a-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.16.a-NSB.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.16.a-NSB.pdf 5985
2016.07.15.b 15-Jul-16 15 Jul 16 2016 Unprovoked USA California Surfside, Orange County Kite surfing Lee Frees M 61 No injury, board damaged N 17h00 White shark, 10’ to 12’ R. Collier, GSAF 2016.07.15.b-Frees.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.15.b-Frees.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.15.b-Frees.pdf 5984
2016.07.15.a 15-Jul-16 15 Jul 16 2016 Unprovoked USA South Carolina North Myrtle Beach, Horry County Swimming male M NA Puncture wounds to foot N NA NA C. Creswell, GSAF 2016.07.15.a-MyrtleBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.15.a-MyrtleBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.15.a-MyrtleBeach.pdf 5983
2016.07.08 08-Jul-16 08 Jul 16 2016 Boat USA California Capitola, Santa Cruz County Fishing for squid Mark Davis M NA No injury. Hull bitten, tooth fragment recovered N NA White shark R. Collier, GSAF 2016.07.08-CapitolaBoat.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.08-CapitolaBoat.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.08-CapitolaBoat.pdf 5980
2016.07.07.b 07-Jul-16 07 Jul 16 2016 Provoked USA Massachusetts Off Gloucester, Essec County Fishing Roger Brissom M 59 Fin of hooked shark injured fisherman’s forearm. . PROVOKED INCIDENT N 10h00 dogfish shark Salem News 7/8/2016 2016.07.07.b-Brissom.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.07.b-Brissom.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.07.b-Brissom.pdf 5979
2016.07.07.a 07-Jul-16 07 Jul 16 2016 Boat USA California Off Palos Verdes peninsula, Los Angeles County Fishing for sharks 24’ boat Shark Tagger Occupant Keith Poe M NA No injury. Hull bitten, tooth fragment recovered N NA White shark R. Collier, GSAF 2016.07.07.a-PoeBoat.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.07.a-PoeBoat.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.07.a-PoeBoat.pdf 5978
2016.07.06 06-Jul-16 06 Jul 16 2016 Unprovoked USA Florida Melbourne Beach, Brevard County Swimming female F 42 Buttocks, thigh, left hand & wrist injured N 14h30 / 15h30 NA Florida Today, 7/6/2016 2016.07.06-Njwoman.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.06-Njwoman.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.07.06-Njwoman.pdf 5977
2016.06.27 27-Jun-16 27 Jun 16 2016 Unprovoked USA South Carolina Sullivan’s Island NA male M 35 Minor injury N 16h20 3’ to 4’ shark C. Creswell, GSAF 2016.06.27-Sullivans.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.27-Sullivans.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.27-Sullivans.pdf 5975
2016.06.25 25-Jun-16 25 Jun 16 2016 Unprovoked USA North Carolina Atlantic Beach, Emerald Isle, Carteret County Surfing male M 11 Foot injured N 14h34 NA C. Creswell, GSAF 2016.06.25-AtlanticBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.25-AtlanticBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.25-AtlanticBeach.pdf 5974
2016.06.21.b 21-Jun-16 21 Jun 16 2016 Unprovoked USA South Carolina North Myrtle Beach, Horry County Floating Jeff Schott M 42 Lacerations and punctures to foot N 15h25 3’ to 5’ shark C. Creswell, GSAF 2016.06.21.b-Schott.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.21.b-Schott.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.21.b-Schott.pdf 5971
2016.06.21.a 21-Jun-16 21 Jun 16 2016 Unprovoked USA Florida Pelican Beach Park, Satellite Beach, Brevard County Wading male M NA Injuries to right calf N 14h55 NA Florida Today, 6/22/2016 2016.06.21.a-SatelliteBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.21.a-SatelliteBeach.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.21.a-SatelliteBeach.pdf 5970
2016.06.15.b 15-Jun-16 15 Jun 16 2016 Unprovoked USA Hawaii Kalapaki Beach, Kauai Surfing male M NA Single puncture wound to arm N 06h00 3’ to 4’ shark West Hawaii Today, 6/16/2016 2016.06.15.b-Kauai.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.15.b-Kauai.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.15.b-Kauai.pdf 5969
2016.06.14 14-Jun-16 14 Jun 16 2016 Unprovoked USA Texas Pirates Beach, Galveston Floating in tube Marin Alice Melton F 6 Injury to lower leg N 17h30 3’ to 4’ shark Click2Houston, 6/14/2016 2016.06.14-Melton.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.14-Melton.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.14-Melton.pdf 5967
2016.06.11 11-Jun-16 11 Jun 16 2016 Unprovoked USA North Carolina Atlantic Beach, Emerald Isle, Carteret County Standing Dillon Bowen M 19 Laceration to wrist N 15h00 3’ shark C. Creswell, GSAF 2016.06.11-Bowen.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.11-Bowen.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.11-Bowen.pdf 5966
2016.06.07 07-Jun-16 07 Jun 16 2016 Invalid USA South Carolina Folly Beach, Charleston County Surfing Jack O’Neill M 27 No injury, board damaged N 11h30 Said to involve an 8’ shark but more likely damage caused by debris C. Creswell, GSAF 2016.06.07-Oneill.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.07-Oneill.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.07-Oneill.pdf 5965
2016.06.05.b 05-Jun-16 05 Jun 16 2016 Unprovoked USA Florida Flagler Beach, Flagler County Swimming male M 64 Leg bitten N 08h30 NA Daytona Beach News-Journal, 6/5/2016 2016.06.05.b-Flagler.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.05.b-Flagler.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.06.05.b-Flagler.pdf 5964
2016.05.29.b 29-May-16 29 May 16 2016 Unprovoked USA California Corona Del Mar, Newport, Orange County Swimming Maria Korcsmaros F 52 Injuries to arm and shoulder N 16h00 NA R. Collier, GSAF 2016.05.29.b-Corona.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.29.b-Corona.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.29.b-Corona.pdf 5958
2016.05.29.a 29-May-16 29 May 16 2016 Unprovoked USA Florida Neptune, Duval County Swimming male M 13 Injury to posterior right leg N 15h45 5’ shark News4Jax, 5/29/2016 2016.05.29.a-Neptune.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.29.a-Neptune.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.29.a-Neptune.pdf 5957
2016.05.22 22-May-16 22 May 16 2016 Unprovoked USA Florida Vero Beach, Indian River County Swimming Mary Marcus F 57 Puncture wounds to thigh N 12h00 NA Florida Today, 5/22/2016 2016.05.22-Marcus.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.22-Marcus.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.22-Marcus.pdf 5956
2016.05.21.b 21-May-16 21 May 16 2016 Unprovoked USA Florida St. Petersburg, Pinellas County Swimming Krystal Magee F 22 Lacerations and puncture wounds to foot and ankle N 18h00 Bull shark, 4’ to 5’ ABC Action News, 6/15/2016 2016.05.21.b-Magee.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.21.b-Magee.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.21.b-Magee.pdf 5955
2016.05.21.a 21-May-16 21 May 16 2016 Unprovoked USA Florida Hugenot Beach , Jacksonville, Duval County Swimming female F 11 Back, arm & hand injured N 17h46 NA Action News Jax, 5/23/2016 2016.05.21.a-Girl.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.21.a-Girl.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.21.a-Girl.pdf 5954
2016.05.18 18-May-16 18 May 16 2016 Unprovoked USA Florida Ponte Vedra, St. Johns County Swimming Mark Wilson M 48 Ankle bitten N Morning Blacktip shark, 4’ News4Jax, 5/19/2016 2016.05.18-Wilson.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.18-Wilson.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.18-Wilson.pdf 5953
2016.05.15 15-May-16 15 May 16 2016 Provoked USA Florida Boca Raton, Palm Beach County Teasing a shark female F 23 Arm grabbed PROVOKED INCIDENT N 13h20 Nurse shark, 2’ CBS News, 5/16/2016 2016.05.15-Boca.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.15-Boca.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.15-Boca.pdf 5952
2016.05.03 03-May-16 03 May 16 2016 Unprovoked USA Hawaii Wailea Beach, Maui Floating male M 59 Minor lacerations to right shoulder N 15h49 NA Maui Now, 5/3/2016 2016.05.03-Maui.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.03-Maui.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.05.03-Maui.pdf 5951
2016.04.23 23-Apr-16 23 Apr 16 2016 Unprovoked USA Florida New Smyrna Beach, Volusia County Surfing Kelton Beardall M 15 Minor injury to left foot N 17h30 NA News4Jax, 4/23/2016 2016.04.23-Beardall.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.04.23-Beardall.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.04.23-Beardall.pdf 5948
2016.04.13 13-Apr-16 13 Apr 16 2016 Unprovoked USA Florida Off Singer Island, Palm Beach County Spearfishing Kyle Senkowicz M 26 Multiple bites to right arm N NA Bull shark, 7’ Palm Beach Post, 4/13/2016 2016.04.13-Senkowicz.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.04.13-Senkowicz.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.04.13-Senkowicz.pdf 5944
2016.04.07.b 07-Apr-16 07 Apr 16 2016 Unprovoked USA Florida Florida Keys, Monroe County Fishing Jonathan Lester M 34 Left hand bitten N NA 5’ to 6’ shark NA 2016.04.07.b-Lester.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.04.07.b-Lester.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.04.07.b-Lester.pdf 5941
2016.04.07.a 07-Apr-16 07 Apr 16 2016 Invalid USA Florida Corners Beach, Jupiter, Palm Beach County SUP Maximo Trinidad M NA Fell off board when spinner shark leapt from the water next to him. No injury to surfer N NA NA YouTube 2016.04.07.a-Trinidad.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.04.07.a-Trinidad.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.04.07.a-Trinidad.pdf 5940
2016.03.31 31-Mar-16 31 Mar 16 2016 Unprovoked USA Hawaii Olowalu, Maui Snorkeling J. Orr F 46 Minor injury to left foot N 11h00 NA Maui Now, 3/31/2016 2016.03.31-Orr.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.03.31-Orr.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.03.31-Orr.pdf 5939
2016.03.28.b 28-Mar-16 28 Mar 16 2016 Unprovoked USA Florida Fort Myers Beach, Lee County NA Nick Kawa M Teen Minor injury to arm. Possibly caused by smalll nurse shark N NA Shark involvement not confirmed Fox 35, 3/30/2015 2016.03.28.b-Kawa.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.03.28.b-Kawa.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.03.28.b-Kawa.pdf 5937
2016.03.13 13-Mar-16 13 Mar 16 2016 Invalid USA California Bolsa Chica State Park, Orange County Surfing unknown NA NA Board reportedly bumped by shark. No injury N Morning Shark involvement not confirmed Orange County Register, 3/13/2016 2016.03.13-BolsaChicaSurfer.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.03.13-BolsaChicaSurfer.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.03.13-BolsaChicaSurfer.pdf 5934
2016.03.11 11-Mar-16 11 Mar 16 2016 Unprovoked USA Florida Vero Beach, St. Lucie County Body surfing Daniel Kenny M 19 Lacerations to right foot and ankle N 13h30 NA WCBV-5, 3/31/206 2016.03.11-Kenny.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.03.11-Kenny.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.03.11-Kenny.pdf 5933
2016.03.04 04-Mar-16 04 Mar 16 2016 Unprovoked USA Florida Ocean Reef Park, Singer Island, Palm Beach County NA male M 12 Superficial injury to foot N Afternoon NA WPTV. 3/4/2016 2016.03.04-OCPark.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.03.04-OCPark.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.03.04-OCPark.pdf 5931
2016.01.28 28-Jan-16 28 Jan 16 2016 Unprovoked USA Hawaii Hanalei Bay, Kauai Surfing male M NA Lacerations to both hands N 14h00 Reef shark, 5’ KHON2. 1/28/2016 2016.01.28-Kauai.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.01.28-Kauai.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.01.28-Kauai.pdf 5920
2016.01.25 25-Jan-16 25 Jan 16 2016 Unprovoked USA Hawaii Hanalei Bay, Kauai, Surfing Kaya Waldman F 15 No injury N 11h30 NA The Garden Island, 2/2/2016 2016.01.25-Waldman.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.01.25-Waldman.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.01.25-Waldman.pdf 5919
2016.01.24.b 24-Jan-16 24 Jan 16 2016 Unprovoked USA Texas Off Surfside Spearfishing Keith Love M NA Bruised ribs & tail bone, speargun broken and wetsuit cut N 09h30 / 10h00 Bull sharks x 2 K. Love 2016.01.24.b-Love.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.01.24.b-Love.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.01.24.b-Love.pdf 5918
2016.01.23 23-Jan-16 23 Jan 16 2016 Unprovoked USA Hawaii Wailea Beach, Maui Paddle boarding Matt Mason M 48 No injury N Morning Tiger shark, 14’ Grand Forks Herald, 1/27/2915 2016.01.23-Mason.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.01.23-Mason.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2016.01.23-Mason.pdf 5916
2015.12.22 22-Dec-15 22 Dec 15 2015 Unprovoked USA Hawaii La’aloa Beach Park Paddle boarding Robert Ford M 71 No injury, shark bit board N Morning 9’ shark West Hawaii Today, 12/23/2015 2015.12.22-Ford.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.12.22-Ford.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.12.22-Ford.pdf 5910
2015.11.15.b 15-Nov-15 15 Nov 15 2015 Unprovoked USA Florida Palm Beach, Palm Beach County Swimming Sarah Rose Bogden F NA Leg injured N 11h00 NA Palm Beach Post, 11/16/2015 2015.11.15.b-Bogden.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.11.15.b-Bogden.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.11.15.b-Bogden.pdf 5902
2015.11.15.a 15-Nov-15 15 Nov 15 2015 Unprovoked USA Florida Ocean Reef Park, Singer Island, Palm Beach County Surfing Allen Engelman M 28 Lacerations to hand N NA Spinner shark, 7’ 5WPTV, 11/15/2015 2015.11.15.a-Engelman.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.11.15.a-Engelman.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/http://sharkattackfile.net/spreadsheets/pdf_directory/2015.11.15.a-Engelman.pdf 5901
2015.12.23 07-Nov-15 07 Nov 15 2015 Invalid USA Florida Paradise Beach, Melbourne, Brevard County Surfing Ryla Underwood F 9 Lower left leg injured N 11h00 Shark involvement not confirmed Fox25Orlando, 11/7/2015 2015.11.07-Underwood.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.11.07-Underwood.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.11.07-Underwood.pdf 5899
2015.11.03 03-Nov-15 03 Nov 15 2015 Unprovoked USA Hawaii Kehena Beach, Hawaii Swimming Paul O’Leary M 54 Laceration to right ankle N 11h00 NA Hawaii News Now, 11/4/2015 2015.11.03-O’Leary.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.11.03-O'Leary.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.11.03-O'Leary.pdf 5898
2015.11.01.b 01-Nov-15 01 Nov 15 2015 Unprovoked USA Florida Cocoa Beach, Brevard County Wading Jill Kruse F 28 Injury to right ankle/calf & hand N 14h00 3’ to 5’ shark USA Today, 11/1/2015 2015.11.01.b-Kruse.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.11.01.b-Kruse.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.11.01.b-Kruse.pdf 5897
2015.10.28.a 28-Oct-15 28 Oct 15 2015 Unprovoked USA Hawaii Malaka, Oahu Body boarding Raymond Senensi M 10 Lacerations & puncture wounds to right thigh, calf & ankle N 14h50 NA Star Advertiser, 10/28/2015 2015.10.28-Senensi.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.10.28-Senensi.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.10.28-Senensi.pdf 5894
2015.10.21 21-Oct-15 21 Oct 15 2015 Unprovoked USA Florida Playalinda Beach, Brevard County Surfing Michael Salinger M 21 Lacerations to left hand N 14h30 5’ shark ClickOrlando.com, 10/21/2015 2015.10.21-Playalinda.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.10.21-Playalinda.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.10.21-Playalinda.pdf 5892
2015.10.19 19-Oct-15 19 Oct 15 2015 Unprovoked USA Florida Deerfield Beach, Broward County Surfing Peter Kirn M 21 Left foot bitten N 13h50 Spinner shark, 5’ NBC6.com, 10/19/2015 2015.10.19-Kirn.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.10.19-Kirn.pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.10.19-Kirn.pdf 5891
2015.10.17.b 17-Oct-15 17 Oct 15 2015 Invalid USA Hawaii Waikiki, Surfing male M 32 Left foot bitten by eel N 19h20 No shark involvement KHON2, 10/17/2015 2015.10.17.b.-Hawaii. pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.10.17.b.-Hawaii. pdf http://sharkattackfile.net/spreadsheets/pdf_directory/2015.10.17.b.-Hawaii. pdf 5889

Raise the Sales!

Let’s get some Ice Cream Sales data from the Federal Reserve! A.K.A. IPN31152N:

  • Industrial Production
    • Manufacturing
      • Non-Durable Goods
        • Ice Cream and Frozen Dessert (NAICS = 31152)

Dates

  • It’s much nicer when the dates are uniformly formatted…
    • The lubridate library provides the standard date manipulation functionality
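For comparison, here is a minimal Python sketch of the same "pull the month out of a uniformly formatted date" idea, using only the standard library. The three date strings are copied from the shark table rows; the helper name `month_label` is hypothetical, and `%b` assumes an English locale for month abbreviations.

```python
from datetime import datetime

# Date strings in the DD-Mon-YY style used by the shark attack table
raw_dates = ["21-May-16", "03-Nov-15", "28-Jan-16"]

def month_label(date_string):
    """Parse a DD-Mon-YY string and return the abbreviated month name."""
    parsed = datetime.strptime(date_string, "%d-%b-%y")
    return parsed.strftime("%b")

months = [month_label(d) for d in raw_dates]
print(months)
```

Once every date parses with a single format string, grouping by month is a one-liner; the messy part is getting to that uniform format first.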
# https://fred.stlouisfed.org/series/IPN31152N
i_scream <- read_csv("IPN31152N.csv")

# https://lubridate.tidyverse.org
# https://stackoverflow.com/questions/33221425/how-do-i-group-my-date-variable-into-month-year-in-r
i_scream <- i_scream %>% 
  mutate(month = lubridate::month(DATE, label = TRUE))

I Scream \(\implies\) Shark Attacks

Here are the monthly totals of Shark Attacks, and the average Industrial Ice Cream Production (IPN31152N index).

Evidence?

  • tidyverse handles the standard group_by and inner_join table operations
Month Total Shark Attacks Average Ice Cream Sales
Apr 156 119.34282
Aug 319 125.16183
Dec 62 79.48116
Feb 57 102.87803
Jan 51 86.83899
Jul 343 130.87839
Jun 234 138.61432
Mar 96 114.67007
May 149 122.50903
Nov 116 85.92279
Oct 198 97.61893
Sep 284 110.95583
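To put a number on the association in the monthly table, here is a small sketch that computes the Pearson correlation between the two columns. The values are copied from the table above; the variable names are only illustrative.

```python
import numpy as np

# Monthly values copied from the table (same row order: Apr, Aug, Dec, ...)
months = ["Apr", "Aug", "Dec", "Feb", "Jan", "Jul",
          "Jun", "Mar", "May", "Nov", "Oct", "Sep"]
attacks = np.array([156, 319, 62, 57, 51, 343,
                    234, 96, 149, 116, 198, 284], dtype=float)
ice_cream = np.array([119.34282, 125.16183, 79.48116, 102.87803,
                      86.83899, 130.87839, 138.61432, 114.67007,
                      122.50903, 85.92279, 97.61893, 110.95583])

# Pearson correlation between ice cream production and shark attacks
r = np.corrcoef(ice_cream, attacks)[0, 1]
print(f"Pearson correlation: {r:.2f}")
```

A clearly positive correlation across twelve months is exactly the kind of "evidence" that looks convincing until we ask what else changes with the calendar.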
# https://stackoverflow.com/questions/<below>
# 14942681/change-size-of-axes-title-and-labels-in-ggplot2
# 28243514/ggplot2-change-title-size
# 23527385/place-y-axis-on-the-right

ggplot(sharks_i_scream, 
       mapping=aes(x=`Average Ice Cream Sales`, 
                   y=`Total Shark Attacks`,
                   label=Month)) + 
  geom_smooth(method='lm') + geom_text() + 
  ggtitle('Shark attacks are more prevalent when we eat more Ice Cream!') +
  annotate("text", x=90, y=370, size=6,
           label='Duh-Duh, Duh-Duh...', color='gray') +
  annotate("text", x=128, y=-20, size=6,
           label='Nom! Nom! Nom! Nom!', color='gray') +
  theme(axis.title=element_text(size=14),
        plot.title=element_text(size=15)) + 
  scale_y_continuous(position = "right")

Correlation is not Causation!

How would ice cream production mechanistically influence shark attacks?

  • What would be a better explanation?
    • Time of the year?
      • Mechanism for influencing ice cream eating behavior?
      • Mechanism for influencing beach attendance behavior?
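The "time of the year" explanation can be sketched with simulated data. This is an illustration under made-up assumptions, not the shark data: a hypothetical `temperature` variable drives both `ice_cream` and `attacks`, and neither has any effect on the other. The coefficients 2.0 and 3.0 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical confounder Z: drives both variables below
temperature = rng.normal(size=n)
ice_cream = 2.0 * temperature + rng.normal(size=n)  # X depends on Z only
attacks = 3.0 * temperature + rng.normal(size=n)    # Y depends on Z only

# Raw association between X and Y
marginal_r = np.corrcoef(ice_cream, attacks)[0, 1]

def residualize(v, z):
    """Remove the linear effect of z from v (simple regression residuals)."""
    slope = np.cov(v, z)[0, 1] / np.var(z, ddof=1)
    return v - v.mean() - slope * (z - z.mean())

# Association between X and Y after accounting for Z
partial_r = np.corrcoef(residualize(ice_cream, temperature),
                        residualize(attacks, temperature))[0, 1]

print(f"marginal correlation: {marginal_r:.2f}")
print(f"correlation after removing temperature: {partial_r:.2f}")
```

The strong raw correlation essentially disappears once the shared driver is accounted for, which is what we would expect if the calendar, rather than ice cream, explains the shark attacks.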
library(gridExtra)

a_plot <- sharks_i_scream %>%
    ggplot(mapping=aes(x=`Average Ice Cream Sales`, 
                       y=`Total Shark Attacks`)) + 
    geom_point() + geom_smooth(method='lm') +
    coord_cartesian(ylim=c(50, 340)) + 
    theme(axis.title=element_text(size=12),
          plot.title=element_text(size=15))


#no_ticks <- theme(axis.text.x = element_blank(),
#                  axis.text.y = element_blank())

# http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/81-ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page/
grid.arrange(grobs = list(a_plot + geom_text(aes(label=Month),
                                             vjust="inward", hjust="inward") + 
                                   ggtitle("(A) Confounding"),
                          a_plot + labs(x='Caffeine Consumption', 
                                        y='Productivity') + 
                                   ggtitle("(B) Causality"),
                          a_plot + labs(x='Height', y='Weight') + 
                                   ggtitle("(C) Complexity"),
                          a_plot + labs(x='Dosage', y='Response') + 
                                   ggtitle("(D) Intervention")),
             nrow=2, aspect=TRUE)

Caffeine/Productivity

  • What is the direction of causality?
    • Are you working because you’re drinking coffee; or
      • are you drinking coffee because you’re working

Height/Weight

  • What other factors affect weight?
    • Suppose this data uses a population of statisticians:
      • how would the picture change in a population of children?

Treatment/Response

  • Could this association mislead us?
    • What if the treatment is self-selected?
      • E.g., aspirin usage and inflammation

Observation VS Experimentation

# install.packages('latex2exp')
library(latex2exp)

a_plot <- sharks_i_scream %>%
    ggplot(mapping=aes(x=`Average Ice Cream Sales`, 
                       y=`Total Shark Attacks`)) + geom_point()

# http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/81-ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page/
grid.arrange(a_plot + labs(x=latex2exp::TeX('Uncontrolled Observed Treatment $(X)$'), 
                           y=latex2exp::TeX('Observed Outcome $(Y)$')) + 
                      ggtitle("(E) Observational Study"),
             a_plot + labs(x=latex2exp::TeX('Controlled Treatment Assignment $(T)$'), 
                           y=latex2exp::TeX('Observed Outcome $(Y)$')) + 
                      ggtitle("(F) Controlled Experiment"),
             ncol=2, respect=TRUE)

Some Maths

  • \(Y\): Outcome [Inflammation Reduction]
  • \(X\): Intervention [Aspirin]
  • \(Z\): Confounder [Sprain Severity]

Is there something that can guess the Intervention?

\[\huge \textrm{ Is } f(X|Y) \textrm{ actually } f(X|Y,Z)?\]

  • \(T\): Unconfounded Intervention

\[\huge f(X|Y,Z) = f(X|Y) = f(T|Y)\]

Bonus: this concept is called propensity scores!

library(plotly)  # provides plot_ly(), add_markers(), add_paths(), subplot(), layout()

sharks_i_scream %>% rename(Treatment=`Average Ice Cream Sales`, 
                           Outcome=`Total Shark Attacks`) %>%
  mutate(Confounder=Treatment+Outcome) -> XYZ
  
# https://plotly.com/r/3d-subplots/
ax <- list(
  title = 'Confounder',
  zeroline = FALSE,
  showline = FALSE,
  showticklabels = FALSE,
  showgrid = FALSE
)

# https://community.plotly.com/t/droplines-from-points-in-3d-scatterplot/4113/11
XYZ_floor <- replicate(2, XYZ, simplify = F)
XYZ_floor[[2]]$Confounder <- 0
XYZ_floor <- XYZ_floor %>% bind_rows()

scene1 = list(camera=list(eye=list(x=-.000001, y=-.01, z=3)),
              zaxis=ax)

plot_ly(scene='scene1', showlegend=FALSE) %>%
        add_markers(data=XYZ, marker=list(color='black'),
                    x=~Treatment, y=~Outcome, z=~Confounder)  %>%
        add_paths(data=XYZ_floor, line=list(color='black'),
                  x=~Treatment, y=~Outcome, z=~Confounder) -> confounder

XYZ %>% mutate(Confounder=100+0*Confounder) -> XYZ2
XYZ2_floor <- replicate(2, XYZ2, simplify = F)
XYZ2_floor[[2]]$Confounder <- 0
XYZ2_floor <- XYZ2_floor %>% bind_rows()
XYZ %>% mutate(Confounder=50+0*Confounder) -> XYZ2

axx <- list(title = "Treatment", automargin = TRUE)
axy <- list(title = "Outcome", automargin = TRUE)
scene2 = list(camera=list(eye=list(x=-.000001, y=-.01, z=3)),
              zaxis=ax, xaxis=axx, yaxis=axy)

plot_ly(scene='scene2', showlegend=FALSE) %>%
        add_markers(data=XYZ2, marker=list(color='green'),
                    x=~Treatment, y=~Outcome, z=~Confounder)  %>%
        add_paths(data=XYZ2_floor, line=list(color='green'),
                  x=~Treatment, y=~Outcome, z=~Confounder) -> no_confounder

subplot(confounder, no_confounder) %>% 
  layout(scene=scene1, scene2=scene2, margin = list(pad = 10))

Randomization to the Rescue!

  • By what mechanism might you try to ensure that your Treatment Assignment is not a Confounded Treatment Assignment?
    • I.e., is not accidentally confounded with the Outcome?
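Here is a simulated sketch of why a coin flip works as that mechanism. Everything in it is a made-up assumption in the spirit of the aspirin example: `severity` plays the confounder, the true treatment effect is set to 1.0, and in the observational arm people with worse sprains are more likely to treat themselves.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
true_effect = 1.0

# Hypothetical confounder Z: sprain severity lowers the outcome
severity = rng.normal(size=n)

def outcome(treated):
    """Outcome Y = treatment effect - severity penalty + noise."""
    return true_effect * treated - 2.0 * severity + rng.normal(size=n)

def difference_in_means(y, treated):
    return y[treated == 1].mean() - y[treated == 0].mean()

# Observational study: treatment is self-selected and depends on severity
self_selected = (severity + rng.normal(size=n) > 0).astype(int)
observational_estimate = difference_in_means(outcome(self_selected),
                                             self_selected)

# Controlled experiment: treatment is assigned by a fair coin flip
randomized = rng.integers(0, 2, size=n)
experimental_estimate = difference_in_means(outcome(randomized), randomized)

print(f"self-selected estimate: {observational_estimate:.2f}")
print(f"randomized estimate:    {experimental_estimate:.2f}")
```

With self-selection the treatment even looks harmful, because the treated group is dominated by severe sprains. The coin flip cannot "know" the severity, so the two groups are comparable and the estimate lands near the true effect.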
XYZ %>% mutate(Confounder=100+0*Confounder) -> XYZ2
XYZ2 %>% add_column(size=.1, color='green') -> XYZ2
n <- 20
XYZ2_floor <- replicate(n, XYZ2, simplify = F)
for(i in 1:20){
  if(i<=10)
    XYZ2_floor[[i]]$Confounder <- (i-1)*100/n
  else
    XYZ2_floor[[i]]$Confounder <- i*100/n
}
XYZ2_floor <- XYZ2_floor %>% bind_rows()
XYZ %>% mutate(Confounder=50+0*Confounder) -> XYZ2
XYZ2 %>% add_column(size=10, color='blue') -> XYZ2

axx <- list(title = "Treatment", automargin = TRUE)
axy <- list(title = "Outcome", automargin = TRUE)
scene1 = list(camera=list(eye=list(x=-.000001, y=-.01, z=3)),
              zaxis=ax, xaxis=axx, yaxis=axy)

plot_ly(scene='scene1', showlegend=FALSE) %>%
        add_markers(data=XYZ2_floor, size=~size, sizes=c(10,25), 
                    marker=list(symbol='circle', sizemode='diameter', 
                                color='blue'),
                    x=~Treatment, y=~Outcome, z=~Confounder)  %>%
        add_markers(data=XYZ2, size=~size, sizes=c(10,25), 
                    marker=list(symbol='circle', sizemode='diameter',
                                color='green'),
                    x=~Treatment, y=~Outcome, z=~Confounder) -> uni

XYZ %>% mutate(Confounder=100+0*Confounder) -> XYZ2
XYZ2 %>% add_column(size=.1, color='green') -> XYZ2
n <- 20
XYZ2_floor <- replicate(n, XYZ2, simplify = F)
for(i in 1:n){
  # every layer gets its own random Confounder draw (one value per month)
  XYZ2_floor[[i]]$Confounder <- rnorm(12,50,20)
}
XYZ2_floor <- XYZ2_floor %>% bind_rows()
XYZ %>% mutate(Confounder=50+0*Confounder) -> XYZ2
XYZ2 %>% add_column(size=10, color='blue') -> XYZ2

axx <- list(title = "Treatment", automargin = TRUE)
axy <- list(title = "Outcome", automargin = TRUE)
scene2 = list(camera=list(eye=list(x=-.000001, y=-.01, z=3)),
              zaxis=ax, xaxis=axx, yaxis=axy)

plot_ly(scene='scene2', showlegend=FALSE) %>%
        add_markers(data=XYZ2_floor, size=~size, sizes=c(10,25), 
                    marker=list(symbol='circle', sizemode='diameter', 
                                color='blue'),
                    x=~Treatment, y=~Outcome, z=~Confounder)  %>%
        add_markers(data=XYZ2, size=~size, sizes=c(10,25), 
                    marker=list(symbol='circle', sizemode='diameter', 
                                color='green'),
                    x=~Treatment, y=~Outcome, z=~Confounder) -> rand

subplot(uni, rand) %>% 
  layout(scene=scene1, scene2=scene2, margin=list(pad=0))

One More Important Benefit: by randomly sampling from the population, we also make our treatment groups representative of that population

The Horseshoe Prior

A presentation example for teaching Upper-Year Statistics Major students. Designed specifically for the interview process for the Department of Statistical Sciences at the University of Toronto.

Regularization

\(L_2\) (Ridge)

  • Also known as Tikhonov regularization
  • Corresponds to an analysis with a Gaussian or Normal Prior

\[ \begin{align*} \hline \log \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{y_i-\mathbf{x}_i^\intercal \boldsymbol{\beta}}{ \Large \sigma}\right)^2} {} = & \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi\sigma^2}} - \sum_{i=1}^n \frac{1}{2}\left(\frac{y_i-\mathbf{x}_i^\intercal \boldsymbol{\beta}}{\large \sigma}\right)^2 \\\\\hline {}\\ \log \left(\left( \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{y_i-\mathbf{x}_i^\intercal \boldsymbol{\beta}}{\large \sigma}\right)^2} \right) (2\pi\sigma_0^2)^{-p/2} e^{-\frac{(\boldsymbol{\beta}-\boldsymbol{\beta}_0)^\intercal(\boldsymbol{\beta}-\boldsymbol{\beta}_0)}{2\sigma_0^2}} \right) {} = & \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi\sigma^2}} - \sum_{i=1}^n \frac{1}{2}\left(\frac{y_i-\mathbf{x}_i^\intercal \boldsymbol{\beta}}{\large \sigma}\right)^2 + \\ & {} \log \left((2\pi\sigma_0^2)^{-p/2}\right) - \frac{(\boldsymbol{\beta}-\boldsymbol{\beta}_0)^\intercal(\boldsymbol{\beta}-\boldsymbol{\beta}_0)}{2\sigma_0^2} \\\\\hline \\ \Large \textrm{Loss} + \textrm{Penalty} \quad\quad\quad \normalsize {} = & \sum_{i=1}^n \frac{1}{2}\left(y_i-\mathbf{x}_i^\intercal \boldsymbol{\beta}\right)^2 + \lambda \sum_{k=1}^{p} \beta_k^2 \\\hline \end{align*} \]

\(L_1\) (Lasso)

  • Corresponds to an analysis with a Double Exponential or Laplace Prior

\[ \begin{align*} \hline \log \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{y_i-\mathbf{x}_i^\intercal \boldsymbol{\beta}}{\Large \sigma}\right)^2} {} = & \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi\sigma^2}} - \sum_{i=1}^n \frac{1}{2}\left(\frac{y_i-\mathbf{x}_i^\intercal \boldsymbol{\beta}}{\large \sigma}\right)^2 \\\\\hline {}\\ \log \left(\left( \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{y_i-\mathbf{x}_i^\intercal \boldsymbol{\beta}}{\large \sigma}\right)^2} \right) \prod_{k=1}^p \frac{1}{2b_{k0}} e^{-\frac{|\beta_k-\beta_{k0}|}{b_{k0}}} \right) {} = & \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi\sigma^2}} - \sum_{i=1}^n \frac{1}{2}\left(\frac{y_i-\mathbf{x}_i^\intercal \boldsymbol{\beta}}{\large \sigma}\right)^2 + \\ & {} \sum_{k=1}^{p} \log \frac{1}{2b_{k0}} - \sum_{k=1}^{p} \frac{|\beta_k-\beta_{k0}|}{b_{k0}}\\\\\hline \\ \Large \textrm{Loss} + \textrm{Penalty} \quad\; \normalsize {} = & \sum_{i=1}^n \frac{1}{2}\left(y_i-\mathbf{x}_i^\intercal \boldsymbol{\beta}\right)^2 + \lambda \sum_{k=1}^{p} |\beta_k| \\\hline \end{align*} \]

Shrinkage

  • Regularization is adding a penalty to the loss function to stabilize model estimation
  • Shrinkage is regularization where model coefficients are pulled towards \(0\) in order to stabilize model estimation
    • I.e. prior locations are set at \(\boldsymbol{\beta}_0 = 0\)
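To see shrinkage happen, here is a minimal numpy sketch of the ridge estimator for the penalized loss \(\sum_i \frac{1}{2}(y_i-\mathbf{x}_i^\intercal\boldsymbol{\beta})^2 + \lambda\sum_k \beta_k^2\) with prior location \(\boldsymbol{\beta}_0 = 0\). The data are simulated and the function name `ridge` is hypothetical; setting the gradient to zero gives \((X^\intercal X + 2\lambda I)\boldsymbol{\beta} = X^\intercal y\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3

# Simulated regression problem (made-up coefficients, for illustration only)
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

def ridge(X, y, lam):
    """Minimize 0.5*||y - X b||^2 + lam*||b||^2 in closed form."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + 2.0 * lam * np.eye(n_features), X.T @ y)

# Coefficient norm for increasing penalty strength (lam=0 is least squares)
lams = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in lams]
for lam, norm in zip(lams, norms):
    print(f"lambda={lam:6.1f}  ||beta||={norm:.3f}")
```

The coefficient vector is pulled steadily toward \(0\) as \(\lambda\) grows, but it never lands exactly on \(0\).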

  • You have probably seen a plot like this before, but now let’s see how it’s made
    • These range from ZERO \((\kappa=0)\) to complete (\(\kappa=1\)) shrinkage

(Handling Sparsity via the Horseshoe)

  • The shape of the shrinkage profile is controlled by the prior!
    • \(L_2\): The Cauchy (Student-t with df=1) prior
      • never applies complete shrinkage \((\kappa \not = 1 \implies \beta_k \not = 0)\)
      • This shrinkage profile is similar to the Gaussian prior
    • \(L_1\): The Laplace (or Double Exponential) prior
      • can achieve total shrinkage \((\kappa = 1 \implies \beta_k = 0)\)
      • but it also does not shrink at all \((\kappa=0)\)
    • \(U?\): The Horseshoe can achieve both
      • complete \((\kappa=1)\) as well as ZERO shrinkage \((\kappa=0)\)

(Bayesian regularization: From Tikhonov to horseshoe)
  • Ridge shrinks both X and Y directions, but neither exactly to \(0\)
  • Lasso shrinks both X and Y directions, one direction exactly to \(0\)
  • Horseshoe shrinks only X or Y directions, one direction exactly to \(0\)
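The ridge and lasso rows of this comparison can be sketched in the simplest possible setting: one observation per coefficient (an identity design), so each coefficient is shrunk on its own. This is an assumption made for illustration; the function names are hypothetical and the horseshoe is not covered here. Minimizing \(\frac{1}{2}(y-\beta)^2 + \lambda\beta^2\) gives \(y/(1+2\lambda)\), while minimizing \(\frac{1}{2}(y-\beta)^2 + \lambda|\beta|\) gives soft-thresholding.

```python
import numpy as np

def ridge_shrink(y, lam):
    """Minimizer of 0.5*(y - b)^2 + lam*b^2: proportional shrinkage."""
    return y / (1.0 + 2.0 * lam)

def lasso_shrink(y, lam):
    """Minimizer of 0.5*(y - b)^2 + lam*|b|: soft-thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# One large signal (X direction) and one small signal (Y direction)
y = np.array([3.0, 0.5])
lam = 1.0

ridge_beta = ridge_shrink(y, lam)
lasso_beta = lasso_shrink(y, lam)
print("ridge:", ridge_beta)
print("lasso:", lasso_beta)
```

Ridge scales both directions down by the same factor and leaves both nonzero. Lasso subtracts the same amount from both, which sends the small signal exactly to \(0\) but also biases the large one.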

Horseshoe

\[ \huge \begin{align*} \beta_i|\lambda_i,\tau &\sim N(0, \sigma^2=\tau^2\lambda_i^2)\\ \lambda_i &\sim HC_+(1)\\ \\\hline \\ \beta_i | \tau & \sim {} HSP(\tau)\\ \end{align*} \]

Half-Cauchy

\[\Large \begin{align*} \underset{\in \mathbb{R}_+}{\lambda_i} & \sim {} HC_+(\gamma) \\ f(\lambda_i \mid \gamma) & = {} \frac{2\cdot 1_{[\lambda_i \geq 0]}(\lambda_i)}{\pi \gamma \left[1 + \left(\frac{\lambda_i}{\gamma}\right)^2\right]} \end{align*} \]

Likelihood

The posterior distribution of a normal-normal specification is well-known:

\[ \Large \begin{align*} Y_i|\beta_i &\sim N(\beta_i, \sigma^2_Y=1)\\ \beta_i|\lambda_i,\tau=1 &\sim N(0, \sigma^2=\lambda_i^2)\\\\ \implies \beta_i|Y_i, \lambda_i,\tau=1 & \sim {} N \left(Y_i\frac{1}{\frac{1}{\lambda_i^2} + 1}, \sigma^2=\frac{1}{\frac{1}{\lambda_i^2} + 1} \right)\\ \\ \hline\\ \implies E[\beta_i|Y_i, \lambda_i] & = {} Y_i\frac{\lambda_i^2}{1 + \lambda_i^2} \\ & = {} Y_i\frac{\lambda_i^2}{1 + \lambda_i^2} + 0 \underbrace{\left(\frac{1}{1 + \lambda_i^2} \right)}_{\kappa_i}\\ & = {} Y_i (1 - \kappa_i) + 0 \kappa_i \end{align*} \]

  • \(\kappa_i=1\): complete shrinkage
  • \(\kappa_i=0\): zero shrinkage
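A quick numerical sketch of the posterior mean formula above, with \(\tau=1\) and \(\sigma^2_Y=1\) as on the slide. The function names and the example values of \(\lambda_i\) and \(Y_i\) are hypothetical.

```python
import numpy as np

def shrinkage_weight(lam):
    """kappa = 1 / (1 + lambda^2), the weight placed on the prior mean 0."""
    return 1.0 / (1.0 + lam**2)

def posterior_mean(y, lam):
    """E[beta | y, lambda] = y * (1 - kappa) + 0 * kappa."""
    kappa = shrinkage_weight(lam)
    return y * (1.0 - kappa) + 0.0 * kappa

# Tiny, moderate, and huge local scales for the same observation y = 2
lams = np.array([0.01, 1.0, 100.0])
kappas = shrinkage_weight(lams)
means = posterior_mean(2.0, lams)
for lam, kappa, mean in zip(lams, kappas, means):
    print(f"lambda={lam:7.2f}  kappa={kappa:.4f}  E[beta|y]={mean:.4f}")
```

A tiny \(\lambda_i\) gives \(\kappa_i \approx 1\) and the estimate collapses to \(0\); a huge \(\lambda_i\) gives \(\kappa_i \approx 0\) and the estimate stays at \(Y_i\).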

Horseshoe Shrinkage Profile

  • But why is the “horseshoe” shrinkage profile “U”-shaped??
  • Let’s derive the distribution of \(\kappa_i\) (based on the distribution of \(\lambda_i\))

Change of Variables

\[ \Large \begin{align*} \underset{\in[0,1]}{\kappa_i} = \frac{1}{1+\underset{\in \mathbb{R}_+}{\lambda_i^2}} & \implies {} \frac{1}{\kappa_i}-1 = \lambda_i^2 \\ & \implies {} \lambda_i = \sqrt{\frac{1}{\kappa_i}-1}\\\\\hline \\ \frac{d \lambda_i}{d \kappa_i} & \underset{rule}{\overset{chain}{=}} {} \frac{1}{2}\left(\frac{1}{\kappa_i}-1\right)^{-\frac{1}{2}} \left(-1\kappa_i^{-2}\right) \\\\\hline \end{align*} \]

\[ \large \begin{align*} & \quad\;\; {} \int_{a\in \mathbb{R}_+}^{b\geq a} f_{_{\lambda_i}}(\lambda_i) d \lambda_i\\ & ={} \int_{\frac{1}{1+a^2} \in [0,1]}^{0 \leq \frac{1}{1+b^2} \leq \frac{1}{1+a^2}} f_{_{\lambda_i}}\left(\sqrt{\frac{1}{\kappa_i}-1}\right) \frac{d \lambda_i}{d \kappa_i} d \kappa_i\\ & ={} - \int^{\frac{1}{1+a^2} \in [0,1]}_{0 \leq \frac{1}{1+b^2} \leq \frac{1}{1+a^2}} f_{_{\lambda_i}}\left(\sqrt{\frac{1}{\kappa_i}-1}\right) \frac{1}{2}\left(\frac{1}{\kappa_i}-1\right)^{-\frac{1}{2}} (-1\kappa_i^{-2}) d \kappa_i\\ & ={} - \int^{\frac{1}{1+a^2} \in [0,1]}_{0 \leq \frac{1}{1+b^2} \leq \frac{1}{1+a^2}} \frac{2}{\pi \left[1 + \left(\sqrt{\frac{1}{\kappa_i}-1}\right)^2\right]} \frac{1}{2}\left(\frac{1}{\kappa_i}-1\right)^{-\frac{1}{2}} (-1\kappa_i^{-2}) d \kappa_i\\ & ={} \int^{\frac{1}{1+a^2} \in [0,1]}_{0 \leq \frac{1}{1+b^2} \leq \frac{1}{1+a^2}} \frac{1}{\pi } \left(\frac{1-\kappa_i}{\kappa_i}\right)^{-\frac{1}{2}} \kappa_i^{-1} d \kappa_i \\\\ & ={} \int^{\frac{1}{1+a^2} \in [0,1]}_{0 \leq \frac{1}{1+b^2} \leq \frac{1}{1+a^2}} \frac{\Gamma(1)}{\Gamma(\frac{1}{2}) \Gamma(\frac{1}{2})} (1-\kappa_i)^{\frac{1}{2}-1} {\kappa_i}^{\frac{1}{2}-1} d \kappa_i\\\\\hline \end{align*} \]
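The final integrand is the \(\textrm{Beta}(\frac{1}{2},\frac{1}{2})\) density, which is also known as the arcsine distribution with CDF \(\frac{2}{\pi}\arcsin\sqrt{\kappa}\). As a sanity check on the derivation, here is a simulation sketch: draw \(\lambda_i\) from a half-Cauchy (the absolute value of a standard Cauchy draw), transform to \(\kappa_i\), and compare the empirical CDF with the arcsine CDF. The sample size, seed, and grid are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# lambda ~ half-Cauchy(1): absolute value of a standard Cauchy draw
lam = np.abs(rng.standard_cauchy(n))

# Change of variables from the slide: kappa = 1 / (1 + lambda^2)
kappa = 1.0 / (1.0 + lam**2)

def arcsine_cdf(k):
    """CDF of Beta(1/2, 1/2), i.e. the arcsine distribution on [0, 1]."""
    return (2.0 / np.pi) * np.arcsin(np.sqrt(k))

# Compare the empirical CDF of kappa with the Beta(1/2, 1/2) CDF on a grid
grid = np.linspace(0.05, 0.95, 19)
empirical = np.array([(kappa <= g).mean() for g in grid])
max_gap = np.max(np.abs(empirical - arcsine_cdf(grid)))
print(f"largest CDF discrepancy on the grid: {max_gap:.4f}")
```

The simulated \(\kappa_i\) pile up near \(0\) and near \(1\), which is the "U" shape that gives the horseshoe its name.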

Language or Library?

Reticulate!
  • Loading the reticulate library lets me run Python in R!
library(reticulate)
# conda_list()
use_condaenv("PyMC", required=TRUE)
Python!
  • Like this!
  • Here’s the beta distribution implied by the horseshoe prior
    • from the (great) scipy.stats library
import numpy as np
from scipy import stats

support_beta = np.linspace(0,1,100)[1:-1]
pdf_beta = stats.beta(a=1/2, b=1/2).pdf(support_beta)
  • And here we implement the change of variables calculation with python

\[\begin{align*}\\ \Large g_{\kappa_i}(\kappa_i) = f_{\lambda_i} \left(\lambda_i = \sqrt{\frac{1}{\kappa_i}-1}\right) \left|\frac{d \lambda_i}{d\kappa_i}\right| \end{align*}\]

support_cauchy = (1/support_beta-1)**0.5
pdf_cauchy = stats.halfcauchy.pdf(support_cauchy)
abs_dx_wrt_dy = .5*(1/support_beta-1)**(-.5) * support_beta**(-2)
pdf_transformed_cauchy = pdf_cauchy * abs_dx_wrt_dy
R!
  • And here we can bring the results back into R for interactive use
    • reticulate python sessions in RStudio are still very R-centric
      • E.g., errors for debugging need to be explicitly requested
      • E.g., matplotlib can be output directly, but not plotly
x <- py$support_beta
beta_pdf <- py$pdf_beta
transformed_cauchy_pdf <- py$pdf_transformed_cauchy
tibble(x=x, `f(x)`=beta_pdf) %>% 
  ggplot(aes(x=x, y=`f(x)`, color='Beta')) + 
  geom_line(size=3) + 
  ggtitle(latex2exp::TeX(
          '$\\beta = \\alpha = \\frac{1}{2}$ Beta Distribution')) +
  geom_line(data=tibble(x=x, y=transformed_cauchy_pdf),
            mapping=aes(x=x, y=y, color='Transformed Cauchy'), 
            linetype='dashed', size=2) +
  scale_color_manual(values=c("black","yellow"))

NCI60
  • We’ll use the NCI60 data, specifically:
    • gene expression for MELANOMA (\(n=8\)) and RENAL (\(n=9\)) cancer cell lines
library(ISLR)
nci60 <- NCI60
  • Which we’ll tentatively normalize, however
    • this choice to address systematic bias needs to be explored more thoroughly
nci60$data[nci60$labs=='RENAL',] %>% t() %>%
  as_tibble() %>% 
  rowid_to_column() %>% rename(gene=rowid) %>%
  mutate(gene=as.factor(gene)) %>%
  pivot_longer(!gene) %>%
  add_column(cell = 'RENAL') -> nci60_RENAL

nci60$data[nci60$labs=='MELANOMA',] %>% t() %>%
  as_tibble() %>% 
  rowid_to_column() %>% rename(gene=rowid) %>%
  mutate(gene=as.factor(gene)) %>%
  pivot_longer(!gene) %>%
  add_column(cell = 'MELANOMA') -> nci60_MELANOMA

nci60_RENAL %>% bind_rows(nci60_MELANOMA) %>%
  mutate(cell=as.factor(cell)) %>%
  mutate(line = paste(cell,name,sep='_')) %>%
  group_by(line) %>% 
  mutate(value_normalized = qqnorm(value, plot.it=FALSE)$x/2) %>% 
  ungroup() -> nci60

nci60 %>% 
  pivot_longer(cols=c(value, value_normalized), names_to="data") %>%
  arrange(line) %>%
  ggplot(aes(y=line, x=value, fill=cell)) + 
  geom_violin() + 
  scale_fill_manual(values=c("purple","blue"),
                    breaks=c("RENAL","MELANOMA")) + 
  facet_grid(cols=vars(data)) 

nci60 %>% 
  pivot_longer(cols=c(value, value_normalized), names_to="data") %>%
  arrange(line) %>%
  ggplot(aes(y=line, x=value, fill=cell)) + 
  geom_violin() + geom_boxplot(width=0.3, fill="yellow") +
  scale_fill_manual(values=c("purple","blue"),
                    breaks=c("RENAL","MELANOMA")) + 
  facet_grid(cols=vars(data)) + lims(x=c(-1,1))

1:1
  • We can easily bring R data into our python session!
print(r.nci60.head())
##   gene name  value   cell       line  value_normalized
## 0    1   V4   0.28  RENAL   RENAL_V4          0.235919
## 1    1  V11   0.27  RENAL  RENAL_V11          0.233051
## 2    1  V12  -0.45  RENAL  RENAL_V12         -0.480740
## 3    1  V13  -0.03  RENAL  RENAL_V13         -0.055993
## 4    1  V14   0.71  RENAL  RENAL_V14          0.591575
print(r.nci60.dtypes)
## gene                category
## name                  object
## value                float64
## cell                category
## line                  object
## value_normalized     float64
## dtype: object
statsmodels!
  • And here we use the (great) statsmodels (python) library!
    • I prefer this display to lm, personally:
      • Cond. No.: Multicollinearity Issues Diagnosis
      • Omnibus/Jarque-Bera: Normality Assumptions (Skew/Kurtosis)
      • Durbin-Watson: Residual Autocorrelation (values near 2, roughly 1.5 to 2.5, are usually “okay”)
import statsmodels.formula.api as smf

p = 5
kp = r.nci60['gene'].apply(lambda x: x in [str(i) for i in range(1,p+1)])
data = r.nci60[kp].copy() 
data['gene'] = data['gene'].astype(str).astype('category')

results = smf.ols('value_normalized ~ gene*cell', data=data).fit()
results.summary()
## <class 'statsmodels.iolib.summary.Summary'>
## """
##                             OLS Regression Results                            
## ==============================================================================
## Dep. Variable:       value_normalized   R-squared:                       0.311
## Model:                            OLS   Adj. R-squared:                  0.229
## Method:                 Least Squares   F-statistic:                     3.770
## Date:                Tue, 09 Feb 2021   Prob (F-statistic):           0.000590
## Time:                        09:17:48   Log-Likelihood:                -50.797
## No. Observations:                  85   AIC:                             121.6
## Df Residuals:                      75   BIC:                             146.0
## Df Model:                           9                                         
## Covariance Type:            nonrobust                                         
## ===========================================================================================
##                               coef    std err          t      P>|t|      [0.025      0.975]
## -------------------------------------------------------------------------------------------
## Intercept                  -0.0438      0.166     -0.265      0.792      -0.374       0.286
## gene[T.2]                   0.0648      0.234      0.277      0.783      -0.402       0.531
## gene[T.3]                   0.0537      0.234      0.229      0.819      -0.413       0.520
## gene[T.4]                  -0.6879      0.234     -2.938      0.004      -1.154      -0.222
## gene[T.5]                   0.1251      0.234      0.534      0.595      -0.341       0.592
## cell[T.RENAL]              -0.0440      0.228     -0.193      0.847      -0.497       0.409
## gene[T.2]:cell[T.RENAL]    -0.4075      0.322     -1.266      0.209      -1.049       0.233
## gene[T.3]:cell[T.RENAL]    -0.0634      0.322     -0.197      0.844      -0.704       0.578
## gene[T.4]:cell[T.RENAL]     0.0596      0.322      0.185      0.854      -0.581       0.701
## gene[T.5]:cell[T.RENAL]     0.0036      0.322      0.011      0.991      -0.637       0.645
## ==============================================================================
## Omnibus:                       10.969   Durbin-Watson:                   2.032
## Prob(Omnibus):                  0.004   Jarque-Bera (JB):               11.591
## Skew:                           0.731   Prob(JB):                      0.00304
## Kurtosis:                       4.064   Cond. No.                         15.7
## ==============================================================================
## 
## Notes:
## [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
## """
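As a quick check on how to read the coefficient table above: each t value is the coefficient divided by its standard error, and each 95% interval is the coefficient plus or minus a t critical value times the standard error. The sketch below redoes this for the gene[T.4] row. The critical value 1.992 for 75 residual degrees of freedom is an assumed, hard-coded constant (rounded from a t table), and the variable names are illustrative only.

```python
# Values copied from the gene[T.4] row of the OLS summary above
coef = -0.6879
std_err = 0.234

# Assumption: two-sided 95% critical value of a t distribution with
# 75 residual degrees of freedom, rounded from a t table
t_crit = 1.992

t_value = coef / std_err
ci_low = coef - t_crit * std_err
ci_high = coef + t_crit * std_err

print(round(t_value, 2))                    # close to the reported -2.938
print(round(ci_low, 3), round(ci_high, 3))  # close to the reported [-1.154, -0.222]
```

Small differences from the printed table come from the rounding of the inputs shown in the summary.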
PyMC!
  • I’ve liked PyMC quite a lot
    • PyMC3 uses Theano
    • PyMC4 will use TensorFlow
      • it will rely on/take advantage of the TensorFlow Probability project
  • Generally, PyMC seems to have a very active and well-supported community
  • I have not worked with Stan, but it also seems to be a very good choice
    • though I slightly prefer PyMC’s planned integration with TensorFlow
      • i.e., the latest and hugely supported automatic differentiation tools
      • as opposed to using a custom, “proprietary” research computation engine
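from scipy import stats  # provides ttest_ind (harmless if already imported)

# Restrict the data to the first p genes
p = 275
kp = r.nci60['gene'].apply(lambda x: x in [str(i) for i in range(1,p+1)])
data = r.nci60[kp].copy() 
data['gene'] = data['gene'].astype(str).astype('category')

# Always keep genes 1-50; keep genes 51-p only if RENAL and MELANOMA
# expression differ at p < 0.01 (equal-variance two-sample t-test)
kp = [str(i) for i in range(1,51)]
for i in range(51,p+1):
  xy = data[data.gene==str(i)]
  x = xy[xy.cell=='RENAL']['value_normalized']
  y = xy[xy.cell=='MELANOMA']['value_normalized']
  if stats.ttest_ind(x, y, equal_var=True).pvalue < 0.01:
    kp += [str(i)]

# Drop the unselected genes and reset the category levels
data = data[data.gene.apply(lambda x: x in kp)].copy()
data['gene'] = data['gene'].astype(str).astype('category')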
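The screening loop above keeps genes 1 to 50 unconditionally and adds any of genes 51 to 275 whose RENAL and MELANOMA expression differ at p < 0.01 under an equal-variance two-sample t-test. As a rough illustration of the statistic behind `stats.ttest_ind(..., equal_var=True)`, here is a minimal standard-library sketch. The function name `pooled_t_statistic` and the toy numbers are hypothetical, and the sketch returns only the statistic and its degrees of freedom, not the p-value.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    std_err = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / std_err, df

# Toy expression values for one hypothetical gene in two cell types
renal = [1.0, 2.0, 3.0, 4.0]
melanoma = [2.0, 4.0, 6.0, 8.0]
t_stat, df = pooled_t_statistic(renal, melanoma)
print(t_stat, df)
```

The p-value then comes from comparing the statistic to a t distribution with `df` degrees of freedom, which is the part the library call handles.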
import pymc3 as pm
# Troubleshooting links (C compiler/header errors on macOS Catalina):
# https://stackoverflow.com/questions/51761599/cannot-find-stdio-h
# https://stackoverflow.com/questions/58278260/cant-compile-a-c-program-on-a-mac-after-upgrading-to-catalina-10-15
# https://stackoverflow.com/questions/58278260/cant-compile-a-c-program-on-a-mac-after-upgrading-to-catalina-10-15/58349403#58349403
# https://github.com/rodluger/starry/issues/261
import pandas as pd

# One-hot (dummy) columns for each gene and each cell type
X_gene = pd.get_dummies(data['gene'])
X_cell = pd.get_dummies(data['cell'])
# Gene-by-cell-type interaction columns: gene indicator times cell-type indicator
X_gene_RENAL = X_gene.values * X_cell[['RENAL']].values
X_gene_MELANOMA = X_gene.values * X_cell[['MELANOMA']].values
X_gene = X_gene.values


with pm.Model() as model_horseshoe:
    
    # Horseshoe-style prior: fixed global scale tau and
    # gene-specific local scales lambdas
    tau = 0.05
    lambda_0 = 1
    lambdas = pm.HalfCauchy('lambdas', beta=lambda_0, 
                            shape=(X_gene_RENAL.shape[1]))
    # Gene-specific residual standard deviations
    sd_gene = pm.HalfNormal('sd_gene', sd=1, shape=(X_gene.shape[1]))

    # Gene-specific baseline (non-RENAL) expression levels
    beta_gene = pm.Normal('beta_gene', mu=0, sd=1, 
                          shape=(X_gene.shape[1]))
    # Non-centered parameterization: sample standard normals, then scale them
    # https://twiecki.io/blog/2017/02/08/bayesian-hierchical-non-centered/
    beta_gene_RENAL_stn = pm.Normal('beta_gene_RENAL_stn', mu=0, sd=1, 
                                    shape=(X_gene_RENAL.shape[1]))
    beta_gene_RENAL = pm.Deterministic("beta_gene_RENAL", 
                                       beta_gene_RENAL_stn*tau*lambdas)
    
    gene_expression = pm.Normal('gene_expression', 
                                # https://docs.pymc.io/api/math.html#math
                                mu = X_gene@beta_gene +\
                                     pm.math.dot(X_gene_RENAL,beta_gene_RENAL),
                                sd = X_gene@sd_gene,
                                observed=data['value_normalized'])

with model_horseshoe:
    posterior_horseshoe = pm.sample()    
## Auto-assigning NUTS sampler...
## Initializing NUTS using jitter+adapt_diag...
## Multiprocess sampling (4 chains in 4 jobs)
## NUTS: [beta_gene_RENAL_stn, beta_gene, sd_gene, lambdas]
## Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 490 seconds.
## There were 238 divergences after tuning. Increase `target_accept` or reparameterize.
## There were 136 divergences after tuning. Increase `target_accept` or reparameterize.
## There were 212 divergences after tuning. Increase `target_accept` or reparameterize.
## There were 150 divergences after tuning. Increase `target_accept` or reparameterize.
## The estimated number of effective samples is smaller than 200 for some parameters.
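The horseshoe model above builds each RENAL effect as a standard normal draw scaled by a fixed global scale `tau = 0.05` and a gene-specific half-Cauchy scale. To get a feel for what that prior implies, the following standard-library sketch simulates draws from it directly, with no PyMC involved. The seed, the number of draws, and the two thresholds used to summarize the draws are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)   # arbitrary seed, only for reproducibility
tau = 0.05       # global scale, as fixed in the model above
n_draws = 10_000

def half_cauchy(scale=1.0):
    """Draw from a half-Cauchy by inverting its CDF at a uniform quantile."""
    return scale * math.tan(math.pi * random.random() / 2)

# Non-centered construction: beta = z * tau * lambda
draws = [random.gauss(0, 1) * tau * half_cauchy() for _ in range(n_draws)]

# Share of prior draws close to zero versus very far from zero
frac_near_zero = sum(abs(b) < tau for b in draws) / n_draws
frac_large = sum(abs(b) > 1 for b in draws) / n_draws
print(frac_near_zero, frac_large)
```

Most draws land within `tau` of zero while a small fraction are far larger, which is the "shrink the small effects, leave the large ones" behavior this prior is meant to encode.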
pm.model_to_graphviz(model_horseshoe).render("graphname", format="png")
## 'graphname.png'

# Comparison model without shrinkage: wide Normal(0, 10) priors on all effects
with pm.Model() as model:
    
    # Gene-specific residual standard deviations
    sd_gene = pm.HalfNormal('sd_gene', sd=1, shape=(X_gene.shape[1]))

    beta_gene = pm.Normal('beta_gene', mu=0, sd=10, 
                          shape=(X_gene.shape[1]))
    beta_gene_RENAL = pm.Normal('beta_gene_RENAL', mu=0, sd=10, 
                                shape=(X_gene_RENAL.shape[1]))
                                
    gene_expression = pm.Normal('gene_expression', 
                                mu = X_gene@beta_gene +\
                                     X_gene_RENAL@beta_gene_RENAL,
                                sd = X_gene@sd_gene,
                                observed=data['value_normalized'])
    
with model:
    posterior = pm.sample()    
## Auto-assigning NUTS sampler...
## Initializing NUTS using jitter+adapt_diag...
## Multiprocess sampling (4 chains in 4 jobs)
## NUTS: [beta_gene_RENAL, beta_gene, sd_gene]
## Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 139 seconds.
#1-9/(9+ (posterior_horseshoe['sd_gene']/posterior_horseshoe['lambdas'])**2)

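# Horseshoe model: posterior mean of each RENAL effect, and posterior mean
# of effect/sd (the "inverse CoV" plotted below)
post_hs_beta_gene_RENAL = posterior_horseshoe['beta_gene_RENAL'].mean(axis=0)
post_hs_CV = (posterior_horseshoe['beta_gene_RENAL']/
              posterior_horseshoe['sd_gene']).mean(axis=0)

# Original (no shrinkage) model: the same two summaries
post_CV = (posterior['beta_gene_RENAL']/
           posterior['sd_gene']).mean(axis=0)
post_beta_gene_RENAL = posterior['beta_gene_RENAL'].mean(axis=0)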
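The plotting code that follows summarizes shrinkage as `kappa = 1 - (horseshoe posterior mean) / (original posterior mean)`, so values near 0 mean the horseshoe left an effect essentially unchanged and values near 1 mean it pulled the effect almost to zero. A toy illustration with made-up posterior means (these numbers are hypothetical, not results from the models above):

```python
# Hypothetical posterior means for three gene-by-RENAL effects
original = [2.0, 0.5, 0.1]
horseshoe = [1.9, 0.25, 0.01]

# kappa = 1 - horseshoe / original: fraction of each effect shrunk away
kappa = [1 - h / o for h, o in zip(horseshoe, original)]
print([round(k, 2) for k in kappa])  # [0.05, 0.5, 0.9]
```

In this toy case the large effect is barely touched while the small one loses 90% of its size.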
library(gridExtra)  # for grid.arrange (harmless if already loaded)

# Left panel: shrinkage ratio kappa against the original model's inverse CoV
# (points with kappa outside [0, 1] are dropped by lims)
ggplot(data=tibble(`Original Inverse CoV` = py$post_CV,
                   kappa = 1 - (py$post_hs_beta_gene_RENAL/
                                py$post_beta_gene_RENAL))) +
  geom_point(aes(x=`Original Inverse CoV`, y=kappa)) + lims(y=c(0,1)) -> shrinkage_ratio

# Right panel: horseshoe versus original inverse CoV, with a y = x reference line
ggplot(data=tibble(`Original Inverse CoV` = py$post_CV, 
                   `Horseshoe Inverse CoV` = py$post_hs_CV) )+
  geom_point(aes(x=`Original Inverse CoV`, y=`Horseshoe Inverse CoV`)) +
  geom_smooth(aes(x=`Original Inverse CoV`, y=`Horseshoe Inverse CoV`),
              formula='y~x', method='loess') +
  geom_abline(intercept=0, slope=1) -> shrinkage_scatter

grid.arrange(grobs=list(shrinkage_ratio, shrinkage_scatter), nrow=1, aspect=TRUE)